A systematic approach is presented for fitting empirical expressions to data depending
on two variables. The problem can also be described as the simultaneous fitting of a family 
of curves depending on a parameter. 

The proposed method reduces a surface fitting problem to that of fitting a few functions 
of one variable each. First, the surface is expressed in terms of these one-variable functions, 
and using an extension of two-way analysis of variance, the accuracy of this fit is assessed 
without having to determine, at this point, the nature of the one-variable functions. Then, 
the one-variable functions are fitted by customary curve-fitting procedures.

For illustration, the method is applied to two sets of experimental data. 



1. Introduction 

A frequently occurring situation in scientific work 
is one in which the relationship between two quan- 
tities is examined for a series of values of a third 
quantity. For example, in the thermodynamic 
studies of gases the pressure-volume relationship 
may be examined at various temperatures. The 
results of such experiments are often presented in 
terms of a one-parameter family of curves. Alter- 
natively, one may describe the problem as the fitting 
of a surface in a space of three dimensions. 

An analysis of a set of data (or curves) of this type 
follows one of two possible lines: either a model is 
postulated on the basis of physicochemical hypoth- 
eses, in which case the main purpose of the analysis 
is to verify the adequacy of this model, and possibly 
to estimate certain constants occurring in the model; 
or there exists no pertinent theory, in which case the 
problem consists in finding a satisfactory empirical
representation of the data. Thus, in our example,
one might postulate Van der Waals' equation:

$$\left(p+\frac{a}{V^{2}}\right)(V-b)=RT \tag{1}$$



where p, V, and T represent pressure, volume, and
temperature, R the gas constant, and a and b two
constants to be inferred from the data. The postu- 
lation of this equation would put the problem in 
the first category. On the other hand, the experi- 
menter may desire to determine the form of the 
equation that best represents his data, without 
committing himself to any specific preconceived 
equation such as (1). In that case, which consti- 
tutes a problem of the second category, the choice 
of a suitable equation may present considerable 
difficulties. There exist few, if any, guidelines to 
assist one in the selection, and trial and error is the 
only way by which a particular equation is finally 



chosen. A widely used statistical procedure for 
fitting curves and surfaces is the method of least 
squares. Application of this method requires that 
some specific functional form be agreed upon prior 
to the fitting process. This process serves to estimate 
the unknown parameters and to evaluate the ade- 
quacy of the fit in terms of the smallness of the 
residuals. There is no assurance, by this method, 
that a much better fit might not be achieved by 
an entirely different functional form. Also, if the 
fit turns out to be inadequate, the method of least 
squares yields little, if any, information regarding 
the direction in which one ought to search for a more 
appropriate model. 

In this paper, the empirical fitting of a family of 
curves is attacked in a systematic way. Mathe- 
matical expressions are used involving functions 
that depend each on one variable only. The nature 
of each of these functions is left entirely open in the 
initial fitting process, and the adequacy of the fit is 
judged without having to specify the nature of these 
functions. Thus, one need not estimate the values
of any parameters before judging the success of the 
fit. 

The specific examples presented in this paper are 
used only to illustrate the mathematical approach 
and not to propose alternative equations of state,
either for rubber or for ethylene. 

2. Generalized Model 

For the sake of clarity, we shall discuss the problem 
first in terms of eq (1). Rewriting eq (1) as:

$$p=-\frac{a}{V^{2}}+\frac{R}{V-b}\,T \tag{2}$$



we see that for any particular value of V, it represents simply a linear relationship between p and T.
Thus, for any value of V, a plot can be made of p versus T, and a straight line fitted to the plotted
points. If data are available for different values of V, this method will result in a collection of straight
lines, one for each value of V. The slope of the straight line, corresponding to any given value of V,
is $R/(V-b)$ and the intercept is $-a/V^{2}$. Thus, by studying
the relationship between the experimentally deter- 
mined slopes and the corresponding values of V, one 
can obtain an estimate of the parameter b. Similarly, 
from the intercepts an estimate of a can be obtained. 
So far, no new technique of analysis has been intro- 
duced, and the procedure is entirely contingent on 
the linearity of p in terms of T. Note, however, that
in fitting each straight line, no use has been made
of the fact that the slope depends on V in accordance
with the function $R/(V-b)$ or that the intercept is inversely
proportional to $V^{2}$. It is only in the estimation
of b and a that consideration has been given to
these facts.
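
The two-step procedure just described can be sketched numerically. The following is only an illustration under stated assumptions: the constants `a_true` and `b_true`, the grid of volumes and temperatures, and the error-free synthetic pressures are all hypothetical and are not taken from the paper. Each row is first fitted by a straight line in T; only afterwards is the dependence of slope and intercept on V used to recover b and a.

```python
import numpy as np

R = 0.082057                      # gas constant, L atm / (mol K)
a_true, b_true = 1.39, 0.0391     # assumed illustrative Van der Waals constants

volumes = np.array([0.2, 0.3, 0.5, 0.8, 1.2])    # hypothetical molar volumes (rows)
temps = np.array([280.0, 300.0, 320.0, 340.0])   # hypothetical temperatures, K (columns)

# Error-free synthetic pressures from eq (2): p = -a/V^2 + [R/(V-b)] T
p = -a_true / volumes[:, None] ** 2 + R / (volumes[:, None] - b_true) * temps[None, :]

# Step 1: one straight line p versus T per volume; the V-dependence is not used here.
line_fits = np.array([np.polyfit(temps, row, 1) for row in p])
slopes, intercepts = line_fits[:, 0], line_fits[:, 1]

# Step 2: only now use slope = R/(V-b) and intercept = -a/V^2.
b_est = np.mean(volumes - R / slopes)
a_est = np.mean(-intercepts * volumes ** 2)
print(a_est, b_est)
```

With real measurements the per-row estimates of b and a would scatter, and one would fit them against V rather than simply averaging.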

Suppose, now, that the experimenter is not com- 
mitted to eq (2) as the only possible representation 
of his data, or that, in fact, he knows this equation to 
be unsatisfactory for that purpose. It is then possi- 
ble to suggest an immediate generalization of eq (2), 
far less restrictive than this equation, that may be 
more adequate as a representation of the data. 

We note that eq (2) belongs to the general class

$$p=f(V)+g(V)\,h(T) \tag{3}$$



where f and g are two distinct functions of volume
only, while h is a function of temperature only.
Equation (3) is more general than eq (2) in that no
assumptions are made regarding the form of the functions
f, g, and h. For example, h(T) may be a
quadratic, or an exponential, or any other desired
function of T. Nor is it necessary to assume that
f(V) and g(V) obey the functional forms $-a/V^{2}$ and
$R/(V-b)$ required by Van der Waals' equation. Any
dependence of h on T and of f and g on V is admissible
in the general formulation of eq (3).

We will adopt as our generalized model that repre- 
sented by eq (3). First we describe a method for 
fitting the model represented by eq (3) and for evalu- 
ating the adequacy of the fit. Then we illustrate 
the usefulness of this model by applying it to two sets 
of experimental data. 

3. Analysis of the Generalized Model 

Let the data be in the form of a rectangular array, 
in which each row is associated with a particular 
value of V and each column with a particular value
of T. Each cell of the array then contains the value 
of pressure corresponding to the volume and temper- 
ature values represented by the row and column 
intersecting in that cell. Such an arrangement is 
shown in table 1. 



Table 1. Schematic of p-V-T data

The main difficulty in fitting eq (3) lies in our
ignorance of the function h(T). Indeed, eq (3) expresses,
for any given value of V, a linear relation
between p and h(T). If h(T) is known for each T,
the straight line corresponding to each value of V
can at once be plotted and the nature of the functions
f(V) and g(V) can then be determined by studying
the slopes and intercepts of the lines as functions of
V. Let us note, however, that a similar analysis can
be made as soon as we have a set of values linearly
related to h(T). For if a function H(T) is defined by

$$H(T)=\alpha+\beta\,h(T) \tag{4}$$

eq (3) can be written

$$p=A(V)+B(V)\,H(T) \tag{5}$$

with

$$A(V)=f(V)-\frac{\alpha}{\beta}\,g(V) \tag{6a}$$

$$B(V)=\frac{g(V)}{\beta} \tag{6b}$$



Then eq (5) also represents, as does eq (3), a linear 
relationship between p and H(T) for each value of 
V. If H(T) is known, the functions A(V) and B(V) 
may then be determined from the linear fits of p 
versus H(T), for different values of V. Now when 
h(T) is unknown, there exist nevertheless many 
functions H(T) the values of which can be inferred 
from the data for all T values represented in the 
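table. One of these functions is given by the
column averages $\bar{p}_T$ of table 1. This follows at once
by averaging both members of eq (3) over all rows,
for any given value of T:

$$\bar{p}_T=\bar{f}+\bar{g}\,h(T) \tag{7}$$

where $\bar{f}$ and $\bar{g}$ denote the averages of f(V) and g(V) over all rows.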



This function belongs indeed to the class of H(T)
defined by eq (4). For reasons of statistical convenience,
a preferable choice is given by

$$C_T=\bar{p}_T-\bar{p} \tag{8}$$

where $\bar{p}$ is the grand average of all $\bar{p}_T$ values in the
table. When H(T) is selected to be $C_T$, as defined
by eq (8), we will refer to the corresponding representation
by eq (5) as the "standard form." Thus,
the standard form is given by:

$$p=A_V+B_V C_T \tag{9}$$

where $C_T$ is defined by (8) and:

$$A_V=f(V)-\frac{\bar{f}-\bar{p}}{\bar{g}}\,g(V) \tag{10a}$$

$$B_V=\frac{g(V)}{\bar{g}} \tag{10b}$$



From (10b) it follows that the average of B v over 
all rows is equal to unity. On the other hand, eq 
(8) shows that the average of the C T over all col- 
umns is equal to zero. Thus: 



$$\bar{B}=1 \quad\text{and}\quad \bar{C}=0. \tag{11}$$



It is easily verified that these two conditions are 
necessary and sufficient for assuring that the repre- 
sentation is in the standard form. Therefore, a 
function of two variables, as represented in table 1, 
may be approximated, in the form of eq (9), by three 
single-variable functions. The function $C_T$ of temperature
is first computed from the column averages
of table 1 by eq (8). A linear fit of each row of the
table versus $C_T$ then gives the values of the functions
$A_V$ and $B_V$ as the intercepts and the slopes of the
fitted lines.

An analytical formula for the function of two
variables may now be obtained by fitting empirical
formulas to the curves $C_T$ versus T, $A_V$ versus V,
and $B_V$ versus V.
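
The reduction to the standard form can be sketched in a few lines. This is a minimal illustration, assuming a complete rectangular table with no missing cells; the function name `standard_form` and the test functions f(V) = V, g(V) = 1 + V, h(T) = T^2 are hypothetical choices made only for the example. Because the $C_T$ average to zero, the intercept of each row's line is simply the row average.

```python
import numpy as np

def standard_form(table):
    """Return (A_V, B_V, C_T) for a complete table with rows = V, columns = T."""
    z = np.asarray(table, dtype=float)
    c = z.mean(axis=0) - z.mean()      # eq (8): column averages minus grand average
    a = z.mean(axis=1)                 # intercepts: row averages, since mean(C_T) = 0
    b = z @ c / (c @ c)                # slopes of each row regressed on C_T
    return a, b, c

# Hypothetical error-free surface of the form f(V) + g(V) h(T).
v = np.array([1.0, 2.0, 3.0])
t = np.array([0.0, 1.0, 2.0, 3.0])
table = v[:, None] + (1.0 + v[:, None]) * t[None, :] ** 2

a, b, c = standard_form(table)
rebuilt = a[:, None] + b[:, None] * c[None, :]   # eq (9)
print(np.allclose(rebuilt, table), b.mean(), c.mean())
```

Note that h(T) is never specified to the routine: the quadratic dependence on T is carried entirely by the computed values of $C_T$.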

4. Statistical Model 

So far we have not considered errors of measure- 
ment. Let us now assume that the experiment has 
been conducted in such a way that V and T are 
controlled and p is a measurement subject to experi- 
mental error. Then eq (3) becomes: 



$$p=f(V)+g(V)\,h(T)+\epsilon' \tag{3a}$$

where $\epsilon'$ is a random error of zero expectation. For
greater generality, the first member in eq (3a) can 
be replaced by any suitable function of p. In work 
dealing with equations of state, such as pressure- 
volume-temperature relationships, it is customary to 
study the quantity pV. Replacing p by pV in the 
left-hand side of (3a) would visibly not change the 
functional nature of the right-hand side of this rela- 
tion and it would generally result in greater homoge- 
neity in the variance of the error term. Representing 
the measured quantity, or any appropriate function
of it (as in this case pV) by $Z_{V,T}$, we have the
general relation

$$Z_{V,T}=f(V)+g(V)\,h(T)+\epsilon \tag{12}$$

which can be written in the standard form:

$$Z_{V,T}=A_V+B_V C_T+\epsilon \tag{13}$$

where $\bar{B}=1$ and $\bar{C}=0$. Specifically, $C_T$ is defined by

$$C_T=\bar{Z}_T-\bar{Z} \tag{14}$$

where $\bar{Z}_T$ is the column average for column T and $\bar{Z}$
the grand average in table 1, the cell entries of which
are $Z_{V,T}$. In regard to the errors, $\epsilon$, we will assume
that they are normally and independently distributed,
constituting a sample from a normal population of
zero mean and variance equal to $\sigma^{2}$.

Under these assumptions, the values $Z_{V,T}$ and $\bar{Z}_T$
(from which the $C_T$ are calculated) are no longer
statistically independent, nor are $Z_{V,T}$ and $C_T$ independent.
It has, however, been shown [1]¹ that
the following analysis is not invalidated by this
circumstance.

5. Statistical Analysis 

Denote by m the number of rows of table 1, and by
n the number of its columns. For each row, a straight
line is fitted to the set of points (Z, C) using the usual
method of linear regression. This yields the estimates,

$$\hat{A}_V=\frac{\sum_T Z_{V,T}}{n} \tag{15}$$

$$\hat{B}_V=\frac{\sum_T Z_{V,T}\,C_T}{\sum_T C_T^{2}} \tag{16}$$

and an estimate of the variance about the regression
line:

$$\hat{V}_V(\epsilon)=\frac{\sum_T\left[Z_{V,T}-(\hat{A}_V+\hat{B}_V C_T)\right]^{2}}{n-2} \tag{17}$$



Since the variance of $\epsilon$ is assumed to be the same for
all values of V, the m estimates given by eq (17)
for the m values of V may be pooled. How this is to
be done will be shown in the discussion on the
analysis of variance. Note, however, that an
inspection of the m values of $\hat{V}(\epsilon)$ is of considerable
interest, especially for the detection of trends
related to the magnitude of V. A pooled value is
meaningful only in the absence of such trends.

From (17), or from a pooled value of $\hat{V}(\epsilon)$, estimates
of the standard errors of $\hat{A}_V$ and $\hat{B}_V$ may be obtained
by the usual formulas.
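
A sketch of the row-by-row estimates follows. The function name `row_regressions` and the simulated data are hypothetical. The standard errors use the ordinary simple-regression formulas with a centered abscissa, taken here as an assumption about what "the usual formulas" means: variance/n for the intercept and variance/sum of $C_T^2$ for the slope.

```python
import numpy as np

def row_regressions(z):
    """Per-row estimates of eqs (15)-(17), plus assumed ordinary-regression standard errors."""
    z = np.asarray(z, dtype=float)
    m, n = z.shape
    c = z.mean(axis=0) - z.mean()                   # C_T, eq (14)
    scc = c @ c                                     # sum over T of C_T^2
    a_hat = z.mean(axis=1)                          # eq (15)
    b_hat = z @ c / scc                             # eq (16)
    resid = z - (a_hat[:, None] + b_hat[:, None] * c[None, :])
    var_hat = (resid ** 2).sum(axis=1) / (n - 2)    # eq (17), one value per row
    se_a = np.sqrt(var_hat / n)                     # assumed "usual formula" for the intercept
    se_b = np.sqrt(var_hat / scc)                   # assumed "usual formula" for the slope
    return a_hat, b_hat, var_hat, se_a, se_b

# Hypothetical noisy table: 4 rows (V) by 6 columns (T).
rng = np.random.default_rng(0)
g = np.array([0.5, 1.0, 1.5, 2.0])
h = np.linspace(0.0, 1.0, 6)
z = 1.0 + g[:, None] * h[None, :] + rng.normal(scale=0.01, size=(4, 6))

a_hat, b_hat, var_hat, se_a, se_b = row_regressions(z)
print(var_hat)      # inspect the m values for a trend with V before pooling
```

The slopes average to exactly one whatever the errors, because the $C_T$ are themselves computed from the column averages of the same table.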



¹ Figures in brackets indicate the literature references at the end of this paper.

6. Case of Concurrence

Among the many possibilities for the structure of 
a family of curves, two special cases deserve partic- 
ular attention. The first concerns a family of "par- 
allel" curves. In this case, the straight lines result- 
ing from the application of the method described in 
this paper will also be parallel. Their slopes are then 
independent of V and all equal to unity so that the 
model reduces to the "additive" type. 



$$Z_{V,T}=A_V+C_T+\epsilon=A_V+\bar{Z}_T-\bar{Z}+\epsilon. \tag{18}$$



The second special case is that in which all the 
curves of the family pass through a common point. 
We denote this situation as the "concurrent" case. 
When the curves are concurrent, then so are the 
straight lines resulting from our analysis. Now a 
necessary and sufficient condition for a collection of 
straight lines of the type 



Z=f(V)+g(V)h(T) 



(19) 



to concur, is that a linear relation exist between f(V)
and g(V). For if $h(T_0)$, $Z_0$ are the coordinates of the
common point, the following identity must hold for
all V:

$$Z_0=f(V)+g(V)\,h(T_0)$$

or

$$f(V)=Z_0-[h(T_0)]\,g(V). \tag{20}$$

This equation expresses a linear relation between f(V)
and g(V), since $Z_0$ and $h(T_0)$ are numerical constants.
Conversely, if this linear relation holds, then the
entire set of straight lines passes through the point
$[h(T_0), Z_0]$, and hence is concurrent.

The importance of the concurrent model is that in 
it, the algebraic expression of the structure of the 
family of curves becomes quite simple. Indeed, replacing
in eq (19) the quantity f(V) by its expression
given by eq (20), we obtain

$$Z=[Z_0-h(T_0)\,g(V)]+g(V)\,h(T)$$

or

$$Z-Z_0=g(V)\,[h(T)-h(T_0)]. \tag{21}$$



Thus, in the case of concurrence, the measured quan- 
tity is essentially the product of two functions, the 
first involving V only, and the second only T. 

We will show in the next section how the con- 
currence of a family of curves is revealed by the 
analysis of variance. 

7. Analysis of Variance 

The theoretical basis of the analysis of variance is 
discussed in reference [1]. The analysis is based on 
the standard form of the model, as given by eq (13), 
which can be rewritten as 



$$Z_{V,T}=A_V+C_T+(B_V-1)\,C_T+\epsilon. \tag{22}$$



To each of the four terms in this expression cor- 
responds a sum of squares, computed as indicated 
in table 2a. 



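Table 2a. Analysis of variance

Term in eq (22)    Degrees of freedom    Sum of squares                                                                          Mean square
$A_V$              $m-1$                 $SS_A=n\sum_V(\hat{A}_V-\bar{Z})^{2}$                                                   $SS_A/(m-1)$
$C_T$              $n-1$                 $SS_C=m\sum_T C_T^{2}$                                                                  $SS_C/(n-1)$
$(B_V-1)C_T$       $m-1$                 $SS_{B\times C}=\sum_V(\hat{B}_V-1)^{2}\sum_T C_T^{2}$                                  $SS_{B\times C}/(m-1)$
$\epsilon$         $(m-1)(n-2)$          $SS_\epsilon=\sum_V\sum_T[Z_{V,T}-(\hat{A}_V+\hat{B}_V C_T)]^{2}$                       $SS_\epsilon/[(m-1)(n-2)]$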



It is seen from table 2a that the usual interaction
term is here partitioned into two parts, $(B_V-1)C_T$
and $\epsilon$. Thus, only $(m-1)(n-2)$ degrees of freedom
are available for random error, the remaining $(m-1)$
being allocated to the important "slope effect."
The analysis thus provides an answer to the question
of how the m estimates given by eq (17) are to
be pooled: the total number of degrees of freedom
for the pooled estimate is $(m-1)(n-2)$ (rather than
$m(n-2)$, because of the correlation between the m
separate estimates). The $m-1$ degrees of freedom
corresponding to the term $(B_V-1)C_T$ provide a
means for testing the "parallelism" of the family of
curves. In the case of parallelism the mean square
corresponding to the $(B_V-1)C_T$ term will not be
significantly larger than the $\epsilon$ mean square and the
model underlying the set of curves becomes the
simple additive model of ordinary analysis of
variance.
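
The partition described above can be sketched as follows. The function name `slope_anova` and the simulated table are hypothetical, and the row and column sums of squares are taken in their customary two-way form, which is an assumption about the garbled entries of the printed table. The sketch checks that the four sums of squares add up to the total sum of squares about the grand average.

```python
import numpy as np

def slope_anova(z):
    """Sums of squares, degrees of freedom and mean squares for the terms of eq (22)."""
    z = np.asarray(z, dtype=float)
    m, n = z.shape
    grand = z.mean()
    c = z.mean(axis=0) - grand
    a_hat = z.mean(axis=1)
    b_hat = z @ c / (c @ c)
    resid = z - (a_hat[:, None] + b_hat[:, None] * c[None, :])
    ss = {
        "A": n * ((a_hat - grand) ** 2).sum(),               # rows
        "C": m * (c ** 2).sum(),                             # columns
        "BxC": ((b_hat - 1.0) ** 2).sum() * (c ** 2).sum(),  # slope effect
        "error": (resid ** 2).sum(),                         # pooled residual
    }
    df = {"A": m - 1, "C": n - 1, "BxC": m - 1, "error": (m - 1) * (n - 2)}
    ms = {term: ss[term] / df[term] for term in ss}
    return ss, df, ms

# Hypothetical noisy table: 5 rows by 7 columns.
rng = np.random.default_rng(1)
g = np.linspace(0.5, 2.0, 5)
h = np.linspace(-1.0, 1.0, 7)
z = 3.0 + g[:, None] * h[None, :] + rng.normal(scale=0.05, size=(5, 7))

ss, df, ms = slope_anova(z)
total_ss = ((z - z.mean()) ** 2).sum()
print(ms["BxC"] / ms["error"])     # ratio used to judge "parallelism"
```

The pooled error mean square `ms["error"]` is the quantity that answers the pooling question for the m separate estimates of eq (17).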

The existence of a point of concurrence is tested
by a further partitioning of the interaction sum of
squares. The test is based on the theorem proved
in the preceding section that a necessary and sufficient
condition of concurrence is the existence of an
exact linear relation between f(V) and g(V). In view
of eqs (10a) and (10b), this implies a linear relation
between $A_V$ and $B_V$. But then the correlation between
these two quantities is unity. Consequently,
the test for concurrence is carried out as follows.
First, compute the correlation coefficient $r_{\hat{A},\hat{B}}$ between
the quantities $\hat{A}_V$ and $\hat{B}_V$. Then partition
the $(B_V-1)C_T$ term as shown in table 2b. If the
mean square for concurrence is significant with
respect to that for nonconcurrence and the latter is
comparable in magnitude to the $\epsilon$ mean square,
there is good evidence that the family of curves
passes through a common point. Of course, one can
also plot the m points $(\hat{A}_V, \hat{B}_V)$; if an exact straight
line (to within $\epsilon$ error) results, there is concurrence
in the family of curves.
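
The split of the slope sum of squares by the squared correlation can be sketched as below. The function name `concurrence_partition` and the error-free concurrent surface (common point at h = 2, Z = 5) are hypothetical. For exactly concurrent data the intercepts and slopes are exactly linearly related, so the nonconcurrence part should vanish to rounding error.

```python
import numpy as np

def concurrence_partition(z):
    """Split SS_BxC into concurrence (1 d.f.) and nonconcurrence (m-2 d.f.) parts."""
    z = np.asarray(z, dtype=float)
    c = z.mean(axis=0) - z.mean()
    a_hat = z.mean(axis=1)
    b_hat = z @ c / (c @ c)
    ss_bxc = ((b_hat - 1.0) ** 2).sum() * (c ** 2).sum()
    r = np.corrcoef(a_hat, b_hat)[0, 1]     # correlation between intercepts and slopes
    return ss_bxc * r ** 2, ss_bxc * (1.0 - r ** 2), r

# Hypothetical concurrent family: Z - Z0 = g(V) [h(T) - h(T0)], as in eq (21).
g = np.array([1.0, 2.0, 4.0, 7.0])
h = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
z = 5.0 + g[:, None] * (h[None, :] - 2.0)

ss_conc, ss_nonconc, r = concurrence_partition(z)
print(ss_conc, ss_nonconc, r)
```

With measured data the nonconcurrence mean square, `ss_nonconc / (m - 2)`, would be compared with the error mean square rather than with zero.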

Table 2b. Test for concurrence

Term in eq (22)     Degrees of freedom    Sum of squares                                                  Mean square
$(B_V-1)C_T$        $m-1$                 $SS_{B\times C}$                                                $SS_{B\times C}/(m-1)$
  Concurrence       $1$                   $SS_{conc}=[SS_{B\times C}]\,r^{2}_{\hat{A},\hat{B}}$           $SS_{conc}$
  Nonconcurrence    $m-2$                 $SS_{nonconc}=[SS_{B\times C}]\,[1-r^{2}_{\hat{A},\hat{B}}]$    $SS_{nonconc}/(m-2)$

8. Further Generalization of the Model

Suppose that application of the proposed method 
to a particular one-parameter family of curves has 
been unsuccessful. In terms of eq (13), this would 
be shown by the failure to obtain straight-line 
relationships when $Z_{V,T}$ is plotted versus $C_T$, for
particular values of V. A natural extension of the
procedure is to try a model of the type

$$Z_{V,T}=A'_V+B'_V C_T+D_V C_T^{2}+\epsilon \tag{23}$$



that is, to fit a quadratic, rather than a linear rela- 
tion, to Z as a function of the column averages. If 
necessary, a polynomial of degree higher than two can 
be tried. Experience shows that the quadratic 
model represented by eq (23) may give very satis- 
factory results where the simpler linear model fails. 
For computational convenience, it is often advan- 
tageous to make the quadratic fit by the method of 
orthogonal polynomials in $C_T$, despite the fact that
the $C_T$ can, of course, not be expected to be equidistant.
The relative advantage of using orthogonal
polynomials increases with the number of rows in
the table, since all rows are fitted versus a constant
set of polynomials in $C_T$. For the quadratic model,
the method of orthogonal polynomials yields the
equation

$$Z_{V,T}=A_V+B_V C_T+D_V\,[Q(C_T)] \tag{24}$$



where $A_V$ and $B_V$ and $C_T$ are identical with the corresponding
quantities used in the linear fit, and
$Q(C_T)$ is defined by:

$$Q(C_T)=C_T^{2}-\frac{\sum_T C_T^{3}}{\sum_T C_T^{2}}\,C_T-\frac{\sum_T C_T^{2}}{n} \tag{25}$$



where n is the number of values of T (number of
columns). The estimate of $D_V$ is given by

$$\hat{D}_V=\frac{\sum_T Z_{V,T}\,[Q(C_T)]}{\sum_T [Q(C_T)]^{2}} \tag{26}$$



The improvement of the quadratic fit over the linear
one can be assessed by the corresponding reduction
in the sum of squares in the analysis of variance.
Denoting the reduction in the sum of squares due
to the quadratic term by $SS_D$, we have:

$$SS_D=\Big(\sum_V \hat{D}_V^{2}\Big)\sum_T [Q(C_T)]^{2} \tag{27}$$

The corresponding number of degrees of freedom is
$m-1$, where m represents the number of V values
(number of rows).
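
The quadratic extension can be sketched as follows, under one explicit assumption: Q is taken to be the second-degree polynomial in $C_T$ with unit leading coefficient that is orthogonal, over the observed columns, to both the constant and $C_T$. The function names and the simulated table are hypothetical. The sketch verifies numerically that the reduction in residual sum of squares on adding the quadratic term equals the quantity of eq (27).

```python
import numpy as np

def quadratic_term(z):
    """Orthogonal quadratic Q(C_T), row coefficients D_V, and the sum of squares of eq (27)."""
    z = np.asarray(z, dtype=float)
    n = z.shape[1]
    c = z.mean(axis=0) - z.mean()
    # Assumed form of Q: orthogonal to 1 and to C_T over the n columns.
    q = c ** 2 - (np.sum(c ** 3) / np.sum(c ** 2)) * c - np.sum(c ** 2) / n
    d_hat = z @ q / (q @ q)                   # eq (26), one value per row
    ss_d = np.sum(d_hat ** 2) * (q @ q)       # eq (27)
    return c, q, d_hat, ss_d

def residual_ss(z, columns):
    """Total residual sum of squares when every row is regressed on the given columns."""
    x = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(x, z.T, rcond=None)
    return np.sum((z.T - x @ coef) ** 2)

# Hypothetical table with genuine curvature: 4 rows by 6 columns.
rng = np.random.default_rng(2)
g = np.array([0.5, 1.0, 1.5, 2.0])
k = np.array([0.3, -0.1, 0.2, 0.4])
h = np.linspace(0.0, 2.0, 6)
z = 1.0 + g[:, None] * h + k[:, None] * h ** 2 + rng.normal(scale=0.01, size=(4, 6))

c, q, d_hat, ss_d = quadratic_term(z)
ones = np.ones_like(c)
ss_linear = residual_ss(z, [ones, c])
ss_quadratic = residual_ss(z, [ones, c, q])
print(ss_linear - ss_quadratic, ss_d)
```

Because Q is orthogonal to the constant and to $C_T$, the intercepts and slopes of the linear fit are unchanged when the quadratic term is added, which is the computational convenience the text refers to.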

9. Application to the Compression of 
Vulcanized Rubber 

The data in table 3 are taken from a study of the 
compression of natural rubber-sulfur vulcanizates
[3]. Tabulated are specific volume measurements
for pressure values ranging from 1 to 10,000 atm 
over a temperature range extending from 20 to 80 
°C. The analysis was made using the program for 
the IBM 7090 computer, to be described in the 
last section. The analysis of variance is shown in 
table 4. This analysis corresponds to a fit of the 
data by the empirical formula 



$$V=A_p+B_p C_T+\epsilon \tag{28}$$

where V is the measured specific volume, $A_p$ and
$B_p$ are two functions of pressure, and $C_T$ is a function
of temperature. The symbol $\epsilon$ represents an
"error-term," including the effect of experimental 
error as well as that of any inadequacy of eq (28) 
to represent the data. It is seen that the standard 
deviation corresponding to this error term is 0.00094. 
Since the values of specific volume are all of the 
order of 0.85, the coefficient of variation of the 
error term is about 0.11 percent. 
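
The arithmetic behind these two figures can be written out, taking the error mean square of 0.00000088 from table 4 and the typical specific volume of 0.85 quoted above:

```latex
\hat{\sigma} = \sqrt{0.88 \times 10^{-6}} \approx 0.00094,
\qquad
\frac{\hat{\sigma}}{0.85} \approx 0.0011 = 0.11\ \text{percent}.
```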

Table 3. Specific volume of rubber

Table 4. Specific volume of rubber: analysis of variance

Term in eq (28) a     Degrees of freedom    Sum of squares    Mean square
$A_p$                 10                    0.0878718         0.008787
$C_T$                 4                     0.0009200         0.000230
$(B_p-1)C_T$          10                    0.0004684         0.00004684
$\epsilon$            30                    0.0000264         0.00000088

a Equation (28) may be written as follows: $V=A_p+C_T+(B_p-1)C_T+\epsilon$.


The values of $A_p$, $B_p$, and $C_T$ are listed in table 5.
Their relations to pressure and temperature are
shown in figures 1, 2, and 3. It is interesting to
compare the results of this analysis with those of
the conventional analysis of variance for a two-way
table. In such an analysis, the effect of "slopes"
would not have been separated from that of random
interaction. Consequently, the trend shown in
figure 2 would have been ignored; i.e., the curve in
this figure would have been replaced by a horizontal
straight line. The "error term" would have
been inflated by the trend of figure 2 and would have
yielded a mean square of $12.37\times10^{-6}$ (the pooled
mean square for the last two terms in table 4)
corresponding to a standard deviation of error of
0.0035, and a coefficient of variation of roughly
0.4 percent.

Table 5. Specific volume of rubber: parameters

Pressure (atm)    $A_p$     $B_p$      Temperature (°C)    $C_T$
1                 0.9494    2.7737     21.0                -0.005370
1000              0.9280    2.0484     38.5                -0.003416
2000              0.9058    1.2497     50.2                -0.000191
3000              0.8893    0.8905     64.0                +0.003306
4000              0.8758    0.6899     81.5                +0.005671
5000              0.8641    0.6149
6000              0.8537    0.6020
7000              0.8444    0.5986
8000              0.8357    0.5658
9000              0.8283    0.4576
10000             0.8208    0.5167

Figure 1. Compression of vulcanized rubber, parameter A (abscissa: pressure, 10^3 atm).

Figure 2. Compression of vulcanized rubber, parameter B (abscissa: pressure, 10^3 atm).

Figure 3. Compression of vulcanized rubber, parameter C.

By means of figures 1, 2, and 3, the effects of 
pressure and temperature on specific volume have 
been quantitatively separated. Figures 1 and 2 
represent the effect of pressure; by fitting empirical 
curves to these graphs, isotherms can be obtained 
for each of the temperatures included in the study. 
Figure 3 represents the effect of temperature. It 
exhibits a possible discontinuity of slope which, if 
real, would be interpreted as a so-called "glass
transition."

In the next section we will discuss another appli- 
cation, for which an analytic expression will be 
derived to represent the data. 

10. Application to the Isotherms of Ethylene 

The data for this illustration are taken from a 
published study of the isotherms of ethylene [2], for 
temperatures between 0 and 150 °C and pressures
up to 3,000 atm. The data for 0 °C were incomplete.
A complete rectangular array could be extracted
from the data, covering 6 values of temperature
(columns), and 40 values of density (rows).
However, in order to demonstrate the capabilities of 
the proposed fitting process, only 13 densities were 
selected from this set. These data are shown in 
table 6; they were analyzed by the IBM 7090 program.
An examination of the residuals revealed,
however, a marked increase in variance with an increase
in density. Therefore, the analysis was repeated,
after "weighting" the rows, representing
densities, by an appropriate factor. This "weighting
by rows" is a simple procedure. Let

$$Z=pV=A_d+B_d C_T+\epsilon_{d,T} \tag{29}$$

and let the variance of $\epsilon_{d,T}$ be given by

(30)

Then, multiplying eq (29) by $\sqrt{\omega_d}$ we have:

$$\sqrt{\omega_d}\,Z_{d,T}=\sqrt{\omega_d}\,A_d+\sqrt{\omega_d}\,B_d\,C_T+\sqrt{\omega_d}\,\epsilon_{d,T}.$$

Table 6. Equation of state for ethylene a

                                 Temperature, °C
Density     25         50         75         100        125        150
19.0407     0.97365    1.07743    1.18010    1.28137    1.38277    1.48309
47.875      0.80607    0.92622    1.04361    1.15894    1.27309    1.38615
90.841      0.60885    0.75053    0.88775    1.02243    1.15528    1.28704
133.083     0.47510    0.63254    0.78765    0.94127    1.09356    1.24486
186.001     0.37578    0.55506    0.73693    0.91911    1.10127    1.28330
205.88      0.35635    0.54767    0.74293    0.93895    1.13521    1.33117
238.60      0.35108    0.56984    0.79304    1.01706    1.24101    1.46466
266.25      0.38459    0.63473    0.88799    1.14112    1.39357    1.64500
291.80      0.46332    0.74881    1.03528    1.32004    1.60310    1.88420
315.34      0.59374    0.91664    1.23807    1.55592    1.87086    2.18230
375.30      1.31315    1.74911    2.17600    2.59290    3.00240    3.40630
415.87      2.2661     2.7896     3.2984     3.7906     4.2745     4.7487
437.03      2.9648     3.5354     4.0890     4.6228     5.1463     5.6596

a The tabulated value is pV.



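Denoting $\sqrt{\omega_d}\,Z_{d,T}$ by $Z^{*}_{d,T}$ we obtain

$$Z^{*}_{d,T}=A^{*}_d+B^{*}_d C_T+\epsilon^{*}_{d,T} \tag{31}$$

where

$$A^{*}_d=\sqrt{\omega_d}\,A_d \tag{32a}$$

$$B^{*}_d=\sqrt{\omega_d}\,B_d \tag{32b}$$

and

$$\epsilon^{*}_{d,T}=\sqrt{\omega_d}\,\epsilon_{d,T} \tag{33}$$

Thus, eq (31) now represents a family of curves with
constant error variance; the $C_T$ are redefined in
terms of the $Z^{*}_{d,T}$, and $A_d$ and $B_d$ are computed from
$A^{*}_d$ and $B^{*}_d$ using eqs (32).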
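
Weighting by rows can be sketched as below. The function name `weighted_row_fit` and the small table are hypothetical. The weights are supplied by the caller; the example uses, purely for illustration, the reciprocal of the squared row average, which makes every $A^{*}_d$ equal to one.

```python
import numpy as np

def weighted_row_fit(z, weights):
    """Fit eq (31) to the row-weighted table and convert back with eqs (32a), (32b)."""
    z = np.asarray(z, dtype=float)
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    z_star = root_w[:, None] * z                  # Z* = sqrt(w_d) Z, row by row
    c = z_star.mean(axis=0) - z_star.mean()       # C_T redefined from the weighted table
    a_star = z_star.mean(axis=1)
    b_star = z_star @ c / (c @ c)
    a = a_star / root_w                           # eq (32a) solved for A_d
    b = b_star / root_w                           # eq (32b) solved for B_d
    return a, b, c, a_star, b_star

# Hypothetical table whose rows differ widely in magnitude.
z = np.array([[1.0, 1.2, 1.5],
              [2.0, 2.6, 3.1],
              [4.0, 5.5, 6.9]])
weights = 1.0 / z.mean(axis=1) ** 2               # illustrative choice of weights

a, b, c, a_star, b_star = weighted_row_fit(z, weights)
print(a_star, a, b)
```

With this choice of weights the recovered $A_d$ are simply the unweighted row averages, while the $B_d$ are slopes with respect to the redefined $C_T$.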

In the present case, the weights $\omega_d$ were chosen in
accordance with the relation

$$\omega_d=\frac{1}{\hat{A}_d^{2}} \tag{34}$$

where $\hat{A}_d$ is of course simply the average of all pV
values in the row corresponding to density d. It
follows from this choice and eq (32a), that the
estimate of $A^{*}_d$ is equal to unity for all values of d.
Equation (31) was fitted to the data and gave a
coefficient of variation of 0.3 percent. Since the
data are believed to have a better precision than
is indicated by this coefficient of variation, the fitting
process was repeated, using the quadratic model:

$$Z^{*}_{d,T}=A^{*}_d+B^{*}_d C_T+D^{*}_d\,[Q(C_T)]+\epsilon^{*}_{d,T} \tag{35}$$



where $Q(C_T)$ is defined by eq (25) and $D^{*}_d$ is estimated
by a formula similar to eq (26). In terms
of the unweighted data, the coefficient of the
quadratic term is $D_d$, where

$$D^{*}_d=\sqrt{\omega_d}\,D_d. \tag{32c}$$

The analysis of variance is given in table 7. It
should be noted that the latter is in terms of the
weighted values, in accordance with eq (35). Thus,
the residual variance is a measure of $V(\epsilon^{*})$, not
$V(\epsilon)$. Furthermore, because $A^{*}_d=1$ for all d, the
mean square corresponding to this term is zero.
From eqs (30) and (33) we infer that:

$$\sigma_{\epsilon^{*}}=\sqrt{\omega_d}\,\sigma_{\epsilon}$$

which, in view of eq (34) becomes

$$\sigma_{\epsilon^{*}}=\frac{\sigma_{\epsilon}}{\hat{A}_d} \tag{36}$$

Table 7. Equation of state for ethylene: analysis of variance

Term in eq (35)                   Degrees of freedom    Sum of squares    Mean square
$A^{*}_d$                         12                    0                 0
$C_T$                             5                     7.53538           1.50708
$(B^{*}_d-1)C_T$                  12                    0.70968           0.05914
$D^{*}_d[Q(C_T)]$                 12                    0.0004263         0.0000355
$\epsilon^{*}$                    36                    0.0000178         0.000000494
Residual error, using eq (31)     48                    0.0004441         0.00000925




Thus, $\sigma_{\epsilon^{*}}$ is roughly equal to the coefficient of variation
of pV. From the analysis of variance it is
seen that this coefficient of variation is equal to
approximately 0.3 percent using the simple model
of the type of eq (13), and that it is reduced to
about 0.07 percent when the more complicated model
involving a term in $C_T^{2}$ is used. This model can be
written

$$Z_{d,T}=A_d+B_d C_T+D_d\,[Q(C_T)]+\epsilon$$

or

$$Z_{d,T}=A'_d+B'_d C_T+D_d C_T^{2}+\epsilon. \tag{37}$$

The values of $A'_d$, $B'_d$, $D_d$, and $C_T$ resulting from the
analysis are given in table 8.

The analysis could be terminated at this point.
Using table 8 and eq (37), a value of $Z_{d,T}$ can be
computed for any value of d and any value of T
within the ranges of these variables covered by the
data. This can be done by numerical interpolation
carried out on the functions $A'_d$, $B'_d$, $D_d$, and $C_T$.
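
As a small check of how eq (37) is used, the sketch below evaluates it for the lowest density, 19.0407, at the six tabulated temperatures and compares the result with the observed pV values of table 6. The numbers are transcribed from the first row of table 8, the C column of table 8, and the first row of table 6; two of them (the power of ten in D and the observed value at 100 °C) are read from imperfectly legible print and should be treated as assumptions.

```python
import numpy as np

# First row of table 8 (density 19.0407): A', B', D.
a1, b1, d1 = 1.229967, 0.559646, -0.177346e-2

# C_T for 25, 50, 75, 100, 125, 150 deg C (table 8).
c = np.array([-0.457050, -0.272819, -0.089140, 0.092803, 0.273447, 0.452757])

# Observed pV for density 19.0407 (first row of table 6).
observed = np.array([0.97365, 1.07743, 1.18010, 1.28137, 1.38277, 1.48309])

fitted = a1 + b1 * c + d1 * c ** 2          # eq (37) without the error term
rel_dev = np.abs(fitted - observed) / observed
print(fitted)
print(rel_dev.max())
```

For densities or temperatures between the tabulated ones, the same expression would be evaluated with interpolated values of $A'_d$, $B'_d$, $D_d$, and $C_T$.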

Table 8. Equation of state for ethylene: parameters

Density     A'          B'          D                      Temperature    C
19.0407     1.229967    0.559646    -0.177346 × 10^-2      25             -0.457050
47.875      1.100321    0.636773    -1.353445 × 10^-2      50             -0.272819
90.841      0.954057    0.744060    -2.150476 × 10^-2      75             -0.089140
133.083     0.863068    0.845462    -0.591976 × 10^-2      100            0.092803
186.001     0.825883    0.998293    2.786081 × 10^-2       125            0.273447
205.88      0.838570    1.072784    3.599158 × 10^-2       150            0.452757
238.60      0.902332    1.225498    3.916304 × 10^-2
266.25      1.011673    1.386588    2.926550 × 10^-2
291.80      1.174651    1.562422    1.179659 × 10^-2
315.34      1.393917    1.746246    -1.032038 × 10^-2
375.30      2.380897    2.298531    -7.853346 × 10^-2
415.87      3.540819    2.724986    -13.286168 × 10^-2
437.03      4.351878    2.957346    -16.109714 × 10^-2

To obtain a complete empirical representation of
the data, one further step is required. The quantities
$A'_d$, $B'_d$, and $D_d$ must be expressed as calculable functions
of the density d, and $C_T$ as a calculable function
of the temperature T. This was done by fitting
polynomial expressions to each of these functions,
using the data in table 8. In particular, the quantity
$C_T$ was satisfactorily fitted by

$$C_T=c_0+c_1 T+c_2 T^{2}.$$

A reduction in the overall number of coefficients is
achieved by introducing the quantity

$$C''_T=\frac{C_T-c_0}{c_1} \tag{38}$$



Then, as can readily be verified, eq (37) can be
written in the form

$$Z_{d,T}=A''_d+B''_d\,C''_T+D''_d\,(C''_T)^{2}+\epsilon. \tag{39}$$



It was found that satisfactory fits were obtained by
using a fourth-degree polynomial for D'' and fifth-degree
polynomials for A'' and B''. The coefficients
of the fitted polynomials are listed in table 9.

Table 9. Equation of state for ethylene: coefficients of fitted polynomials a

Degree of term
in polynomial    A''                  B''                   D''                   C''
0                1.03565              3.37538 × 10^-3       1.75818 × 10^-6       0
1                -9.33337 × 10^-3     4.591[?] × 10^-5      -1.07282 × 10^-7      1
2                4.43017 × 10^-5      -3.95037 × 10^-7      1.13865 × 10^-9       -1.406662 × 10^-4
3                -1.23357 × 10^-7     9.77568 × 10^-10      -3.71113 × 10^-12
4                8.03331 × 10^-11     -2.03756 × 10^-13     3.51256 × 10^-15
5                2.71884 × 10^-13     -9.87817 × 10^-16

a For A'', B'', and D'' the polynomials are in terms of the density d; for C'',
the polynomial is in terms of the temperature T. The equation fitted to the
data is $pV=A''+B''C''+D''(C'')^{2}$.

Using these functions, "calculated" values (denoted
as $\hat{Z}_{d,T}$) are obtained for $Z_{d,T}$ according to the
equation

$$\hat{Z}_{d,T} = \hat{A}''_d + \hat{B}''_d \hat{C}''_T + \hat{D}''_d (\hat{C}''_T)^2 \tag{40}$$

in which $\hat{A}''_d$, $\hat{B}''_d$, $\hat{D}''_d$, and $\hat{C}''_T$ are given by the
polynomials whose coefficients are listed in table 9.
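
A minimal sketch of how eq (40) can be evaluated once the four polynomials are available. The function names and the coefficient values below are illustrative placeholders, not the entries of table 9; coefficients are stored in order of increasing degree, the order in which table 9 lists them.

```python
def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... with coeffs in increasing degree."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def fitted_z(d, t, a_coeffs, b_coeffs, d_coeffs, c_coeffs):
    """Eq (40): Z = A''(d) + B''(d) * C''(T) + D''(d) * C''(T)**2."""
    c = horner(c_coeffs, t)       # C'' depends on the temperature only
    a = horner(a_coeffs, d)       # A'', B'', D'' depend on the density only
    b = horner(b_coeffs, d)
    dd = horner(d_coeffs, d)
    return a + b * c + dd * c * c


# Placeholder coefficients (hypothetical, chosen so the result is easy to check by hand).
A_COEFFS = [1.0, 2.0]           # A''(d) = 1 + 2 d
B_COEFFS = [0.5]                # B''(d) = 0.5
D_COEFFS = [0.25]               # D''(d) = 0.25
C_COEFFS = [0.0, 1.0, -0.125]   # C''(T) = T - 0.125 T**2

z = fitted_z(2.0, 4.0, A_COEFFS, B_COEFFS, D_COEFFS, C_COEFFS)
```

With the real coefficients substituted, the same two functions would reproduce a table of calculated values by looping over the densities and temperatures of interest.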

Values of $\hat{Z}_{d,T}$ for the thirteen densities and six temperatures
are given in table 10. A comparison of
these values with those of table 6 shows that 90 per- 
cent of the fitted values agree with the observed data 
to within 0.5 percent or better, and that of the re- 
maining ones, all but two agree to within 1 percent. 
The largest relative deviation is 1.23 percent. 

The fitting procedure has therefore been very 
successful for these data. Since all the data are 
fitted by a single algebraic expression, interpolation 
for either pressures or temperatures not used in the 
fit should be accurate. To test this point, eq (40) 
was used for interpolation at densities not used in 
the fitting procedure. It may be recalled that the
data used for the fit constituted a selection of 13
densities from a total available set of 40 densities. 
Values of pV were now calculated for all six tempera- 
tures and the following additional densities: 111.849, 
153.349, 221.48, 245.75, 310.08, 355.43, and 456.85. 
This last value is outside the range covered by the 
fit and involves therefore an extrapolation process. 
The remaining six densities involve only interpola- 
tion. Thus, the fitted surface was tested for 42 
individual values by interpolation or extrapolation. 
The results showed that for 35 of these 42 values, 
the difference "observed minus fitted" was less than
0.5 percent of the observed value. All but three of 
these differences were smaller than 1 percent of the 
observed values. The largest difference was equal 
to 1.21 percent of the observed value. Thus, the 
values obtained by interpolation are of the same
order of precision as those directly fitted. This 
appears to be generally true for the procedure pro- 
posed in this paper, provided that the fits used for 
the single-variable functions A, B, C, and D are all 
of sufficient accuracy. 
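
The comparison described above, counting how many fitted values fall within 0.5 and 1 percent of the observations, can be sketched as follows. The function name and the numbers in the example are hypothetical; they are not taken from the ethylene data.

```python
def relative_deviation_summary(observed, fitted):
    """Summarize |observed - fitted| as a percentage of the observed value."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    pct = [abs(o - f) / abs(o) * 100.0 for o, f in zip(observed, fitted)]
    return {
        "n": len(pct),
        "within_0.5_percent": sum(p < 0.5 for p in pct),
        "within_1_percent": sum(p < 1.0 for p in pct),
        "largest_percent": max(pct),
    }


# Hypothetical observed and fitted pV values, for illustration only.
observed = [1.000, 2.000, 4.000, 0.500]
fitted = [1.002, 1.985, 4.060, 0.500]
summary = relative_deviation_summary(observed, fitted)
```

Applying the same summary separately to the fitted densities and to the densities held back for interpolation gives a direct check on whether the two sets are of comparable precision.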

Table 10. Equation of state for ethylene: calculated values

                              Temperature, °C
Density       25         50         75         100        125        150

 19.0407    0.97650    1.07924    1.18138    1.28291    1.38382    1.48413
 47.875     0.80066    0.92173    1.04053    1.15710    1.27147    1.38367
 90.841     0.61040    0.75035    0.88793    1.02317    1.15611    1.28676
133.083     0.48047    0.63589    0.79020    0.94342    1.09554    1.24656
186.001     0.37525    0.55591    0.73715    0.91892    1.10118    1.28389
205.88      0.35326    0.54715    0.74194    0.93757    1.13401    1.33119
238.60      0.34675    0.56859    0.79126    1.01471    1.23889    1.46376
266.25      0.38248    0.63477    0.88716    1.13962    1.39210    1.64457
291.80      0.46452    0.75090    1.03614    1.32021    1.60311    1.88481
315.34      0.59742    0.92027    1.24037    1.55772    1.87233    2.18421
375.30      1.31369    1.74885    2.17557    2.59397    3.00415    3.40623
415.87      2.26312    2.78469    3.29348    3.78971    4.27354    4.74519
437.03      2.96876    3.53605    4.08850    4.62635    5.14983    5.65916



It is interesting to compare the results of this fit 
with the equally empirical fitting process used by
Michels and Geldermans [2]. These authors fitted 
each isotherm individually, requiring a total of 42 
coefficients for the six isotherms, as contrasted with 
the 18 coefficients (listed in table 9) required by the 
present procedure. The residuals obtained by 
Michels and Geldermans are somewhat smaller than 
those obtained by the present fit. On the other 
hand, the procedure used in this paper leads to a 
single algebraic expression to fit the entire surface. 
Differentiation is possible both with respect to 
density and temperature whereas Michels and 
Geldermans' fit does not allow for differentiation 
with respect to temperature. 
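
To make the point about differentiation concrete, the derivatives of the fitted surface of eq (40) follow directly from the polynomial forms. The expressions below are a sketch: the form $\hat{C}''_T = T + \gamma T^2$ is an assumption consistent with table 9, where the polynomial for $C''$ appears to have a unit linear term and a single quadratic coefficient, and $\gamma$ is a symbol introduced here for that coefficient.

```latex
\begin{align*}
\frac{\partial \hat{Z}_{d,T}}{\partial T}
  &= \bigl(\hat{B}''_d + 2\,\hat{D}''_d\,\hat{C}''_T\bigr)\,\frac{d\hat{C}''_T}{dT},
  \qquad \frac{d\hat{C}''_T}{dT} = 1 + 2\gamma T, \\
\frac{\partial \hat{Z}_{d,T}}{\partial d}
  &= \frac{d\hat{A}''_d}{dd} + \hat{C}''_T\,\frac{d\hat{B}''_d}{dd}
   + (\hat{C}''_T)^2\,\frac{d\hat{D}''_d}{dd}.
\end{align*}
```

Both derivatives are available in closed form because every factor is a polynomial in a single variable; a set of separately fitted isotherms offers no comparable expression in $T$.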

11. Computer Program

A program has been written to fit data to the 
linear or quadratic models on the IBM 7090 com- 
puter. The program was written in Fortran. The 
original data, fitted parameters, residuals resulting 
from the fitting procedures, and analysis of variance 
are printed. Row or column weighting may be used. 
Provision is also made for transforming the data, for
combining rows or columns of the data, for applying 
specified corrections to individual data, for reversing 
rows and columns of the data, and for omitting 
specified rows and columns from the original set of 
data. The rows and columns of the data are 
identified by alphabetical or numerical labels so the 
output is easily interpreted without any coding. 
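
The core of such a fit can be sketched in a few lines. The sketch assumes that the linear model has the form $Z_{ij} = A_i + B_i (C_j - \bar{C}) + e_{ij}$, with $C_j$ taken as the column averages, which is one standard way to set up a row-linear extension of the two-way analysis of variance. It is not a transcription of the Fortran program, and it omits weighting, transformations, and the analysis of variance table. All names are hypothetical.

```python
def fit_row_linear(table):
    """Fit Z[i][j] ~ A[i] + B[i] * (C[j] - mean(C)), with C[j] = column mean.

    Returns (A, B, C, residuals). The model form is an assumption of this sketch.
    """
    n_rows, n_cols = len(table), len(table[0])
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    grand_mean = sum(col_means) / n_cols
    dev = [c - grand_mean for c in col_means]   # C[j] - mean(C), sums to zero
    ss_dev = sum(x * x for x in dev)
    if ss_dev == 0.0:
        raise ValueError("column averages are all equal; slopes are undefined")

    # Because the deviations sum to zero, the least-squares intercept of each
    # row is its mean and the slope is the usual regression coefficient.
    row_means = [sum(row) / n_cols for row in table]
    slopes = [sum(z * x for z, x in zip(row, dev)) / ss_dev for row in table]
    residuals = [
        [z - (a + b * x) for z, x in zip(row, dev)]
        for row, a, b in zip(table, row_means, slopes)
    ]
    return row_means, slopes, col_means, residuals


# Small synthetic table (hypothetical data) that follows the model exactly.
data = [
    [0.5, 1.0, 1.5],
    [1.5, 3.0, 4.5],
]
A, B, C, resid = fit_row_linear(data)
```

Options such as omitting or combining rows and columns would amount to preprocessing the table before a call of this kind.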

12. Further Generalizations 

Measurements dependent on two variables are not 
always given in the form of a complete two-way 
array, such as table 1. It is often possible, in such 
cases, to construct such a table by interpolation or 
curve fitting procedures carried out on subsets of the 
data within which one of the two variables is held 
constant. 

The presentation in this paper has been in terms 
of one-parameter families of curves. The method 
can, however, be used for the analysis of families of
curves involving more than one parameter. Applications
of this type are now being made.

13. Summary 

A systematic method has been presented for the 
empirical fitting of data depending on two variables. 
Essentially, the method reduces the fitting of sur- 
faces to that of functions of single variables. In the 
basic model these single-variable functions are com- 
pletely arbitrary, allowing for great flexibility in 
applying the method. The adequacy of the model 
can be evaluated without having to introduce alge- 
braic expressions for the single-variable functions. 
To obtain a complete algebraic representation of the 
surface, it is then merely necessary to fit the single- 
variable functions by any appropriate method. 

In certain cases it may be desirable to omit this
last step, and still retain a workable model which
will express the surface in terms of tabulated functions
of single variables. In that case, numerical
interpolation methods must be applied to these
tabulated values.
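
As a minimal sketch of this tabulated alternative, linear interpolation in one of the single-variable functions might look as follows. The six densities and the accompanying values are the last rows of table 8; reading the second column of that table as $A'_d$ is an assumption, the function name is hypothetical, and a higher-order interpolation scheme may well be preferable in practice.

```python
from bisect import bisect_right

# Last six rows of table 8: density and (assumed) A' values.
DENSITY = [266.25, 291.80, 315.34, 375.30, 415.87, 437.03]
A_PRIME = [1.011673, 1.174651, 1.393917, 2.380897, 3.540819, 4.351878]


def interpolate(x, xs, ys):
    """Linear interpolation in a table with strictly increasing xs."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the tabulated range")
    hi = min(bisect_right(xs, x), len(xs) - 1)   # index of the upper neighbor
    lo = hi - 1
    frac = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + frac * (ys[hi] - ys[lo])


# 355.43 is one of the additional densities used to test the fit.
a_at_355 = interpolate(355.43, DENSITY, A_PRIME)
```

The function refuses to extrapolate, so a density such as 456.85, which lies beyond the last tabulated row, would have to be handled by a deliberate separate choice.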

The first example used to illustrate the method 
deals with the effects of pressure and temperature on 
the specific volume of certain types of rubber. A 
quantitative separation of these effects was obtained 
in terms of tabulated values of three single-variable 
functions. The fit by means of these functions was 
within experimental error. 

A second example concerned the equation of state 
of ethylene. The entire set of data was represented 
by a single algebraic expression and a good fit was 
obtained. Eighteen coefficients were required by 
this fit, as against 42 coefficients necessitated by the 
procedure commonly used for data of this type. 

The statistical analysis required for the applica- 
tion of the proposed procedure is presented. In 
addition to providing estimates for the parameters of 
the model, the analysis allows for testing the sig- 
nificance of the pertinent effects. 